
fix(router): use max_completion_tokens for OpenAI GPT-5+ validation #575

Open
cluster2600 wants to merge 1 commit into NVIDIA:main from cluster2600:fix/517-max-completion-tokens
Conversation

@cluster2600

Summary

Resolves #517: `openshell inference set` fails for OpenAI GPT-5 models because the validation probe sends the deprecated `max_tokens` parameter, which GPT-5+ rejects with HTTP 400.

  • Send max_completion_tokens as the primary parameter in the OpenAI chat completions validation probe
  • Automatically fall back to max_tokens when the backend returns HTTP 400 (for legacy or self-hosted backends)
  • Extract try_validation_request() helper to avoid duplicating the request/response classification logic
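
A minimal sketch of the probe-with-fallback flow described above. The names `validation_probe` and `try_validation_request` mirror the PR description, but the signatures are assumptions; the real code in `crates/openshell-router` talks to an HTTP client rather than a closure:

```rust
// Sketch only: a closure stands in for the real HTTP request sender.
#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    Validated,
    Failure(u16),
}

/// Sends one probe body and classifies the response status.
fn try_validation_request(send: &dyn Fn(&str) -> u16, body: &str) -> ProbeOutcome {
    match send(body) {
        200 => ProbeOutcome::Validated,
        code => ProbeOutcome::Failure(code),
    }
}

/// Primary probe uses `max_completion_tokens`; only an HTTP 400 on the
/// primary attempt triggers the legacy `max_tokens` retry.
fn validation_probe(send: &dyn Fn(&str) -> u16) -> ProbeOutcome {
    let primary = r#"{"max_completion_tokens": 32}"#;
    let fallback = r#"{"max_tokens": 32}"#;
    match try_validation_request(send, primary) {
        ProbeOutcome::Failure(400) => try_validation_request(send, fallback),
        outcome => outcome,
    }
}

fn main() {
    // Legacy backend: rejects the new parameter, accepts the old one.
    let legacy = |body: &str| if body.contains("max_completion_tokens") { 400 } else { 200 };
    assert_eq!(validation_probe(&legacy), ProbeOutcome::Validated);

    // GPT-5+ backend: accepts the new parameter directly.
    let modern = |_: &str| 200;
    assert_eq!(validation_probe(&modern), ProbeOutcome::Validated);
}
```

Note that a non-400 failure (e.g. HTTP 500) is not retried, so genuine endpoint errors still surface immediately.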

Root Cause

OpenAI introduced max_completion_tokens as a replacement for max_tokens starting with the o1 series. GPT-5 and later models reject max_tokens entirely, returning HTTP 400. The validation probe only sent max_tokens, so inference setup would fail for any GPT-5+ model even though the endpoint was perfectly healthy.
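
For illustration, the difference between the two probe bodies is a single field name. The model name, message content, and token limit below are placeholders, not values taken from the PR:

```rust
/// Builds a hypothetical chat-completions probe body; only the token-limit
/// field name changes between the modern and legacy variants.
fn probe_body(use_completion_tokens: bool) -> String {
    let limit_field = if use_completion_tokens {
        "max_completion_tokens" // required by o1-series and GPT-5+ models
    } else {
        "max_tokens" // legacy field, rejected with HTTP 400 by GPT-5+
    };
    format!(
        r#"{{"model":"gpt-5","messages":[{{"role":"user","content":"ping"}}],"{}":32}}"#,
        limit_field
    )
}

fn main() {
    assert!(probe_body(true).contains(r#""max_completion_tokens":32"#));
    assert!(probe_body(false).contains(r#""max_tokens":32"#));
}
```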

```mermaid
graph TD
    subgraph "Before (broken)"
        A["validation_probe()"] -->|"max_tokens: 32"| B[OpenAI API]
        B -->|"HTTP 400: unsupported parameter"| C["ValidationFailure ❌"]
    end

    subgraph "After (fixed)"
        D["validation_probe()"] -->|"max_completion_tokens: 32"| E[OpenAI API]
        E -->|"HTTP 200"| F["ValidatedEndpoint ✅"]
        E -->|"HTTP 400"| G{fallback_body?}
        G -->|"yes"| H["retry with max_tokens: 32"]
        H -->|"HTTP 200"| I["ValidatedEndpoint ✅"]
        G -->|"no"| J["ValidationFailure ❌"]
    end
```

Changes

| File | Change |
| --- | --- |
| `crates/openshell-router/src/backend.rs` | Add `fallback_body` field to `ValidationProbe`; update `openai_chat_completions` probe to use `max_completion_tokens` with `max_tokens` fallback; extract `try_validation_request()` helper; add 3 new tests |
| `crates/openshell-server/src/inference.rs` | Update existing test expectation from `max_tokens` to `max_completion_tokens` |

Test Plan

  • `cargo test -p openshell-router` — 11 passed, 0 failed
  • New test: `verify_openai_chat_uses_max_completion_tokens` — primary probe succeeds with `max_completion_tokens`
  • New test: `verify_openai_chat_falls_back_to_max_tokens` — HTTP 400 on primary triggers retry with `max_tokens`
  • New test: `verify_non_chat_completions_no_fallback` — non-chat protocols (e.g. `anthropic_messages`) do not retry on 400
```mermaid
sequenceDiagram
    participant CLI as openshell inference set
    participant Router as Privacy Router
    participant Backend as OpenAI API

    CLI->>Router: verify_backend_endpoint()
    Router->>Backend: POST /v1/chat/completions<br/>{"max_completion_tokens": 32}

    alt GPT-5+ model
        Backend->>Router: HTTP 200
        Router->>CLI: ValidatedEndpoint ✅
    else Legacy backend
        Backend->>Router: HTTP 400 (unknown param)
        Router->>Backend: POST /v1/chat/completions<br/>{"max_tokens": 32}
        Backend->>Router: HTTP 200
        Router->>CLI: ValidatedEndpoint ✅
    end
```
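
The fallback test in the plan above can be sketched roughly as follows. This is a hedged stand-in: the real tests in `crates/openshell-router` presumably mock the HTTP backend, whereas here a recording closure plays that role, and `run_probe` is a hypothetical helper, not an identifier from the PR:

```rust
/// Runs the two-step probe against a fake backend and returns the final
/// status plus every request body that was sent.
fn run_probe(backend_rejects_new_param: bool) -> (u16, Vec<String>) {
    let mut sent = Vec::new();
    let mut send = |body: &str| -> u16 {
        sent.push(body.to_string());
        if backend_rejects_new_param && body.contains("max_completion_tokens") {
            400 // legacy backend: unknown parameter
        } else {
            200
        }
    };

    let status = {
        let first = send(r#"{"max_completion_tokens":32}"#);
        // Only HTTP 400 triggers the legacy retry.
        if first == 400 { send(r#"{"max_tokens":32}"#) } else { first }
    };
    (status, sent)
}

fn main() {
    // Mirrors verify_openai_chat_falls_back_to_max_tokens: two requests,
    // the second using the legacy parameter, final status 200.
    let (status, bodies) = run_probe(true);
    assert_eq!(status, 200);
    assert_eq!(bodies.len(), 2);
    assert!(bodies[1].contains(r#""max_tokens""#));

    // Mirrors verify_openai_chat_uses_max_completion_tokens: one request.
    let (status, bodies) = run_probe(false);
    assert_eq!((status, bodies.len()), (200, 1));
}
```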

@cluster2600 cluster2600 requested a review from a team as a code owner March 24, 2026 20:57
@github-actions

github-actions bot commented Mar 24, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@github-actions

Thank you for your interest in contributing to OpenShell, @cluster2600.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions github-actions bot closed this Mar 24, 2026
@pimlock pimlock reopened this Mar 25, 2026
…robe

OpenAI GPT-5 models reject the legacy max_tokens parameter and require
max_completion_tokens. The inference validation probe now sends
max_completion_tokens as the primary parameter, with an automatic
fallback to max_tokens when the backend returns HTTP 400 (for
legacy/self-hosted backends that only support the older parameter).

Closes NVIDIA#517

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@cluster2600 cluster2600 force-pushed the fix/517-max-completion-tokens branch from 3c89e9b to 44217f7 Compare March 25, 2026 18:19

Development

Successfully merging this pull request may close these issues.

OpenAI GPT-5 verification fails in openshell inference set due to max_tokens request parameter
